Search | VHL Regional Portal

Federated Learning of Generalized Linear Causal Networks.

Ye, Qiaoling; Amini, Arash A; Zhou, Qing.

IEEE Trans Pattern Anal Mach Intell ; PP, 2024 Mar 26.

Article in English | MEDLINE | ID: mdl-38530737

ABSTRACT

Causal discovery, the inference of causal relations among variables from data, is a fundamental problem of science. Nowadays, due to an increased awareness of data privacy concerns, there has been a shift towards distributed data collection, processing and storage. To meet the pressing need for distributed causal discovery, we propose a novel federated DAG learning method called distributed annealing on regularized likelihood score (DARLS) to learn a causal graph from data stored on multiple clients. DARLS simulates an annealing process to search over the space of topological sorts, where the optimal graphical structure compatible with a sort is found by distributed optimization. This distributed optimization relies on multiple rounds of communication between local clients and a central server to estimate the graphical structure. We establish its convergence to the solution obtained by an oracle with access to all the data. To the best of our knowledge, DARLS is the first distributed method for learning causal graphs with such finite-sample oracle guarantees. To establish the consistency of DARLS, we also derive new identifiability results for causal graphs parameterized by generalized linear models, which could be of independent interest. Through extensive simulation studies and a real-world application, we show that DARLS outperforms existing federated learning methods and is comparable to oracle methods on pooled data, demonstrating its great advantages in estimating causal networks from distributed data.
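The abstract describes a distributed optimization that runs over multiple communication rounds between clients and a central server and matches what an oracle with pooled data would obtain. The sketch below illustrates that idea only in its simplest form and is not DARLS: it assumes a logistic regression (one generalized linear model) fitted by plain gradient descent, where each client sends the server only its local gradient sum. The function names `logistic_grad`, `federated_fit`, and `pooled_fit` are hypothetical.

```python
import math


def logistic_grad(w, data):
    """Sum of per-sample gradients of the logistic negative log-likelihood."""
    grad = [0.0] * len(w)
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad


def federated_fit(clients, dim, rounds=200, lr=0.5):
    """Gradient descent in which raw data never leaves a client (illustrative)."""
    n_total = sum(len(c) for c in clients)
    w = [0.0] * dim
    for _ in range(rounds):
        # Communication round: the server broadcasts w, clients return gradients.
        local = [logistic_grad(w, c) for c in clients]
        # Server step: aggregate the gradients and update the shared parameters.
        agg = [sum(g[j] for g in local) / n_total for j in range(dim)]
        w = [wj - lr * gj for wj, gj in zip(w, agg)]
    return w


def pooled_fit(data, dim, rounds=200, lr=0.5):
    """Oracle baseline: the same procedure with all data held by one party."""
    return federated_fit([data], dim, rounds, lr)


# Toy data: each sample is ((intercept, feature), label).
clients = [
    [((1.0, 0.5), 1), ((1.0, -1.0), 0), ((1.0, 0.2), 0)],
    [((1.0, 1.5), 1), ((1.0, -0.3), 1), ((1.0, 0.8), 0)],
]
w_fed = federated_fit(clients, dim=2)
w_pool = pooled_fit(clients[0] + clients[1], dim=2)
```

Because the pooled gradient is the sum of the client gradients, `w_fed` and `w_pool` agree up to floating-point rounding in this toy setting. The method in the paper additionally searches over topological sorts and uses a regularized likelihood, neither of which is modeled here.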

Optimizing Regularized Cholesky Score for Order-Based Learning of Bayesian Networks.

Ye, Qiaoling; Amini, Arash A; Zhou, Qing.

IEEE Trans Pattern Anal Mach Intell ; 43(10): 3555-3572, 2021 Oct.

Article in English | MEDLINE | ID: mdl-32340938

ABSTRACT

Bayesian networks are a class of popular graphical models that encode causal and conditional independence relations among variables by directed acyclic graphs (DAGs). We propose a novel structure learning method, annealing on regularized Cholesky score (ARCS), to search over topological sorts, or permutations of nodes, for a high-scoring Bayesian network. Our scoring function is derived from regularizing Gaussian DAG likelihood, and its optimization gives an alternative formulation of the sparse Cholesky factorization problem from a statistical viewpoint. We combine simulated annealing over permutation space with a fast proximal gradient algorithm, operating on triangular matrices of edge coefficients, to compute the score of any permutation. Combined, the two approaches allow us to quickly and effectively search over the space of DAGs without the need to verify the acyclicity constraint or to enumerate possible parent sets given a candidate topological sort. The annealing aspect of the optimization is able to consistently improve the accuracy of DAGs learned by greedy and deterministic search algorithms. In addition, we develop several techniques to facilitate the structure learning, including pre-annealing data-driven tuning parameter selection and post-annealing constraint-based structure refinement. Through extensive numerical comparisons, we show that ARCS outperformed existing methods by a substantial margin, demonstrating its great advantage in structure learning of Bayesian networks from both observational and experimental data. We also establish the consistency of our scoring function in estimating topological sorts and DAG structures in the large-sample limit. Source code of ARCS is available at https://github.com/yeqiaoling/arcs_bn.
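The search over topological sorts described above can be sketched as a generic simulated annealing loop over permutations. This is a minimal illustration under stated assumptions, not the ARCS implementation: the regularized Cholesky score is replaced by a toy score (`backward_edges`, which counts edges of a known graph that violate a candidate order), the proposal is an adjacent swap, and the geometric cooling schedule and all parameter names are hypothetical.

```python
import math
import random


def backward_edges(order, edges):
    """Toy score: number of edges (u, v) where u does not precede v in `order`."""
    pos = {node: i for i, node in enumerate(order)}
    return sum(1 for u, v in edges if pos[u] > pos[v])


def anneal_over_orders(nodes, score, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimize `score` over permutations of `nodes` (needs at least two nodes)."""
    rng = random.Random(seed)
    current = list(nodes)
    rng.shuffle(current)
    cur_score = score(current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(n_iter):
        # Propose a neighboring permutation by swapping two adjacent nodes.
        i = rng.randrange(len(current) - 1)
        proposal = list(current)
        proposal[i], proposal[i + 1] = proposal[i + 1], proposal[i]
        delta = score(proposal) - cur_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_score = proposal, cur_score + delta
            if cur_score < best_score:
                best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best, best_score


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("c", "e")]
best_order, best_score = anneal_over_orders(
    nodes, lambda order: backward_edges(order, edges)
)
```

Every permutation is a valid topological sort of some DAG, so the loop never has to check acyclicity; that is the property the abstract highlights. In the paper, scoring a permutation involves a proximal gradient optimization over triangular coefficient matrices, which this sketch does not attempt.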

On the properties of the toxicity index and its statistical efficiency.

Razaee, Zahra S; Amini, Arash A; Diniz, Márcio A; Tighiouart, Mourad; Yothers, Greg; Rogatko, André.

Stat Med ; 40(6): 1535-1552, 2021 Mar 15.

Article in English | MEDLINE | ID: mdl-33345351

ABSTRACT

Cancer clinical trials typically generate detailed patient toxicity data. The most common measure used to summarize patient toxicity is the maximum grade among all toxicities, which does not fully represent the toxicity burden experienced by patients. In this article, we study the mathematical and statistical properties of the toxicity index (TI) in an effort to address this deficiency. We introduce a total ordering (T-rank) that allows us to fully rank the patients according to how frequently they exhibit toxicities, and show that TI is the only measure that preserves the T-rank among its competitors. Moreover, we propose a Poisson-Limit model for sparse toxicity data. Under this model, we develop a general two-sample test, which can be applied to any summary measure for detecting differences between two populations of toxicity data. We derive the asymptotic power function of this class as well as the asymptotic relative efficiency (ARE) of the members of the class. We evaluate the ARE formula empirically and show that if the data are drawn from a random Poisson-Limit model, the TI is more efficient, with high probability, than the maximum and the average summary measures. Finally, we evaluate our method on clinical trial toxicity data and show that TI has higher power in detecting the differences in toxicity profile among treatments. The results of this article can be applied beyond toxicity modeling, to any problem where one observes a sparse array of scores on subjects and a ranking based on extreme scores is desirable.
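To see why the maximum grade loses information, the sketch below compares it with the toxicity index on two hypothetical patients. The abstract does not state the TI formula; the code assumes the definition commonly given in the TI literature, in which grades are sorted in decreasing order, X1 >= X2 >= ..., and TI = X1 + X2/(1 + X1) + X3/((1 + X1)(1 + X2)) + ... Treat that definition, and the function name `toxicity_index`, as assumptions to be checked against the article.

```python
def toxicity_index(grades):
    """Toxicity index under the assumed definition (weights shrink with rank)."""
    ordered = sorted(grades, reverse=True)
    ti, weight = 0.0, 1.0
    for g in ordered:
        ti += weight * g
        weight /= 1 + g  # each later grade is discounted by the earlier ones
    return ti


# Two hypothetical patients with the same maximum grade but different burdens.
patient_a = [3, 0, 0]
patient_b = [1, 3, 2]

max_a, max_b = max(patient_a), max(patient_b)
ti_a, ti_b = toxicity_index(patient_a), toxicity_index(patient_b)
```

Both patients have a maximum grade of 3, so that summary cannot separate them, while under the assumed formula the second patient gets a strictly larger index because of the additional grade 2 and grade 1 events.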

Subjects

Neoplasms, Humans, Research Design
